Abstract: The amount of information in the world keeps increasing, and much of it has become accessible on the web through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. We present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data units in the same group have the same semantic label. The six annotations obtained for each group are then combined to predict its final annotation label. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.

Keywords: Data alignment, data annotation, web database, wrapper generation.